Electcomponents

Overview

Dataset statistics

Number of variables: 14
Number of observations: 1912
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 209.2 KiB
Average record size in memory: 112.1 B
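The overview figures can be recomputed with plain pandas. This is a minimal sketch that assumes missing cells are counted over all columns, duplicate rows are fully identical rows, and memory is measured with `deep=True`. It is not pandas-profiling's own implementation, and its byte counts may differ from the figures above.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Recompute headline dataset statistics for a DataFrame."""
    n_obs, n_vars = df.shape
    missing = int(df.isna().sum().sum())
    duplicates = int(df.duplicated().sum())
    # deep=True also counts the bytes held by Python string objects.
    total_bytes = int(df.memory_usage(deep=True).sum())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / (n_obs * n_vars),
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs,
        "total_bytes": total_bytes,
        "avg_record_bytes": total_bytes / n_obs,
    }


# Hypothetical toy frame, not the Electcomponents data.
toy = pd.DataFrame({"a": [1, 1, None], "b": ["x", "x", "y"]})
print(overview(toy))
```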

Variable types

NUM: 9
CAT: 4
BOOL: 1

Warnings

width has constant value "600" (Constant)
height has constant value "600" (Constant)
counting has constant value "1" (Constant)
filename has a high cardinality: 1031 distinct values (High cardinality)
area is highly correlated with width_obj and 2 other fields (High correlation)
width_obj is highly correlated with area and 1 other field (High correlation)
height_obj is highly correlated with area and 1 other field (High correlation)
scale is highly correlated with width_obj and 2 other fields (High correlation)
filename is uniformly distributed (Uniform)
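The constant and high-cardinality warnings can be approximated with `nunique`. In this sketch the threshold of 50 distinct values is an assumption (it is believed to be pandas-profiling's default cardinality threshold, but check the configuration of your version), and the function name is hypothetical. The constant check reports the repeated value itself.

```python
import pandas as pd


def simple_warnings(df: pd.DataFrame, cardinality_threshold: int = 50) -> list:
    """Flag constant columns and high-cardinality non-numeric columns."""
    messages = []
    for col in df.columns:
        n_distinct = df[col].nunique(dropna=True)
        if n_distinct == 1:
            value = df[col].dropna().iloc[0]
            messages.append(f'{col} has constant value "{value}"')
        elif (not pd.api.types.is_numeric_dtype(df[col])
              and n_distinct > cardinality_threshold):
            messages.append(
                f"{col} has a high cardinality: {n_distinct} distinct values")
    return messages


# Hypothetical frame mimicking the kinds of columns flagged above.
demo = pd.DataFrame({
    "width": [600] * 60,
    "filename": [f"img_{i}.jpg" for i in range(60)],
    "xmin": range(60),
})
for line in simple_warnings(demo):
    print(line)
```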

Reproduction

Analysis started: 2021-02-11 02:10:48.264261
Analysis finished: 2021-02-11 02:11:08.185775
Duration: 19.92 seconds
Software version: pandas-profiling v2.9.0

Variables

filename
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1031
Distinct (%): 53.9%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 KiB
Value | Count | Frequency (%)
DJI_0029_1.jpg | 5 | 0.3%
DJI_601_2.jpg | 4 | 0.2%
DJI_03300_3.jpg | 4 | 0.2%
DJI_600_2.jpg | 4 | 0.2%
DJI_02800_3_1.jpg | 4 | 0.2%
DJI_722_2.jpg | 4 | 0.2%
DJI_580_5.jpg | 4 | 0.2%
DJI_568_4.jpg | 4 | 0.2%
DJI_0013_3000-1000.jpg | 4 | 0.2%
DJI_568_8.jpg | 4 | 0.2%
Other values (1021) | 1871 | 97.9%
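Each frequency table lists the most common values and folds the rest into an "Other values (k)" row. For filename, the ten listed values cover 41 rows and the remaining 1021 values cover 1871 rows, 1912 in total. A sketch of that aggregation, assuming percentages are taken over all rows (the function name is hypothetical):

```python
import pandas as pd


def frequency_table(s: pd.Series, top: int = 10) -> list:
    """Rows of (label, count, percent): the `top` most frequent values,
    then one 'Other values (k)' row aggregating the remaining k values."""
    counts = s.value_counts()  # sorted by descending count
    n = len(s)
    rows = [(str(v), int(c), float(100 * c / n)) for v, c in counts.head(top).items()]
    rest = counts.iloc[top:]
    if len(rest) > 0:
        rows.append((f"Other values ({len(rest)})", int(rest.sum()),
                     float(100 * rest.sum() / n)))
    return rows


# Hypothetical series: "a" three times, "b" twice, "c" once.
for label, count, pct in frequency_table(pd.Series(list("aaabbc")), top=1):
    print(f"{label} | {count} | {pct:.1f}%")
```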
 
Frequencies of value counts

Unique

Unique: 426
Unique (%): 22.3%
Histogram of lengths of the category

Length

Max length: 65
Median length: 13
Mean length: 19.70502092
Min length: 13
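The figures above are consistent with the following definitions, which this sketch assumes: Distinct counts different values (1031 of 1912 rows, 53.9%), Unique counts values that occur exactly once (426, 22.3%), and length statistics are taken per row rather than per distinct value. The per-row reading reproduces the class column's mean length: (12 × 787 + 17 × 389 + 16 × 369 + 11 × 367) / 1912 ≈ 13.597. The helper name is hypothetical.

```python
import pandas as pd


def categorical_summary(s: pd.Series) -> dict:
    counts = s.value_counts()
    n = len(s)
    n_unique = int((counts == 1).sum())  # values that occur exactly once
    lengths = s.astype(str).str.len()    # string length of every row
    return {
        "distinct": int(counts.size),
        "distinct_pct": 100 * counts.size / n,
        "unique": n_unique,
        "unique_pct": 100 * n_unique / n,
        "max_length": int(lengths.max()),
        "median_length": float(lengths.median()),
        "mean_length": float(lengths.mean()),
        "min_length": int(lengths.min()),
    }


print(categorical_summary(pd.Series(["ab", "ab", "cde", "f"])))
```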

width
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 KiB
Value | Count | Frequency (%)
600 | 1912 | 100.0%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

height
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 KiB
Value | Count | Frequency (%)
600 | 1912 | 100.0%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

class
Categorical

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 KiB
Value | Count | Frequency (%)
missing knob | 787 | 41.2%
missing insulator | 389 | 20.3%
broken insulator | 369 | 19.3%
rusty clamp | 367 | 19.2%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 17
Median length: 12
Mean length: 13.59728033
Min length: 11

xmin
Real number (ℝ≥0)

Distinct: 520
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 263.3948745
Minimum: 1
Maximum: 603
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 34.55
Q1: 149.75
Median: 252
Q3: 392
95th percentile: 506
Maximum: 603
Range: 602
Interquartile range (IQR): 242.25

Descriptive statistics

Standard deviation: 147.0365292
Coefficient of variation (CV): 0.5582361063
Kurtosis: -0.998351046
Mean: 263.3948745
Median Absolute Deviation (MAD): 115
Skewness: 0.1332249282
Sum: 503611
Variance: 21619.74091
Monotonicity: Not monotonic
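Most quantile and descriptive statistics map onto standard pandas calls. This sketch assumes linear interpolation for percentiles (pandas' default, and consistent with fractional values such as the 5th percentile of 34.55 on integer data) and CV defined as standard deviation over mean, which matches the figures above: 147.0365292 / 263.3948745 ≈ 0.5582. The negative kurtosis values suggest excess (Fisher) kurtosis, which is what `Series.kurt` returns; whether pandas-profiling uses exactly these estimators is not stated in the report.

```python
import pandas as pd


def numeric_summary(s: pd.Series) -> dict:
    # Linear interpolation between order statistics (pandas default).
    p5, q1, median, q3, p95 = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    std = s.std()  # sample standard deviation (ddof=1)
    return {
        "p5": p5, "q1": q1, "median": median, "q3": q3, "p95": p95,
        "range": s.max() - s.min(),
        "iqr": q3 - q1,
        "std": std,
        "variance": s.var(),
        "cv": std / s.mean(),                # coefficient of variation
        "mad": (s - median).abs().median(),  # median absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),                # excess kurtosis (normal -> 0)
        "monotonic": s.is_monotonic_increasing or s.is_monotonic_decreasing,
    }


print(numeric_summary(pd.Series([1, 2, 3, 4, 5])))
```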
Histogram with fixed size bins (bins=50)
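Common values
Value | Count | Frequency (%)
1 | 16 | 0.8%
256 | 11 | 0.6%
245 | 10 | 0.5%
269 | 10 | 0.5%
90 | 9 | 0.5%
497 | 9 | 0.5%
147 | 9 | 0.5%
253 | 9 | 0.5%
172 | 9 | 0.5%
82 | 9 | 0.5%
Other values (510) | 1811 | 94.7%

Minimum 5 values
Value | Count | Frequency (%)
1 | 16 | 0.8%
2 | 3 | 0.2%
3 | 4 | 0.2%
4 | 1 | 0.1%
5 | 2 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
603 | 1 | 0.1%
599 | 1 | 0.1%
567 | 1 | 0.1%
561 | 1 | 0.1%
558 | 1 | 0.1%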
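The minimum and maximum tables list the smallest and largest distinct values together with how often each occurs; for xmin, the smallest value, 1, occurs 16 times. A pandas sketch of how such tables could be built (an assumption, not the library's code; the function name is hypothetical):

```python
import pandas as pd


def extreme_values(s: pd.Series, k: int = 5):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = s.value_counts().sort_index()  # ascending by value
    smallest = counts.head(k)
    largest = counts.tail(k).iloc[::-1]      # descending by value
    return smallest, largest


lo, hi = extreme_values(pd.Series([3, 1, 1, 2, 9, 7, 7, 7]), k=2)
print(lo.to_dict())
print(hi.to_dict())
```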
 

ymin
Real number (ℝ≥0)

Distinct: 523
Distinct (%): 27.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 307.542364
Minimum: 1
Maximum: 613
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 59
Q1: 205
Median: 300
Q3: 439
95th percentile: 525
Maximum: 613
Range: 612
Interquartile range (IQR): 234

Descriptive statistics

Standard deviation: 146.7867888
Coefficient of variation (CV): 0.4772896549
Kurtosis: -0.9356396286
Mean: 307.542364
Median Absolute Deviation (MAD): 118.5
Skewness: -0.1505527284
Sum: 588021
Variance: 21546.36136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
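Common values
Value | Count | Frequency (%)
1 | 17 | 0.9%
473 | 13 | 0.7%
272 | 11 | 0.6%
264 | 11 | 0.6%
475 | 11 | 0.6%
247 | 11 | 0.6%
477 | 11 | 0.6%
271 | 11 | 0.6%
266 | 11 | 0.6%
268 | 9 | 0.5%
Other values (513) | 1796 | 93.9%

Minimum 5 values
Value | Count | Frequency (%)
1 | 17 | 0.9%
2 | 1 | 0.1%
3 | 3 | 0.2%
4 | 1 | 0.1%
6 | 2 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
613 | 1 | 0.1%
611 | 1 | 0.1%
600 | 1 | 0.1%
596 | 1 | 0.1%
592 | 1 | 0.1%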
 

xmax
Real number (ℝ≥0)

Distinct: 510
Distinct (%): 26.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 351.9027197
Minimum: 32
Maximum: 640
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 32
5th percentile: 108
Q1: 227
Median: 362
Q3: 475
95th percentile: 577
Maximum: 640
Range: 608
Interquartile range (IQR): 248

Descriptive statistics

Standard deviation: 147.6958454
Coefficient of variation (CV): 0.4197064619
Kurtosis: -1.029236854
Mean: 351.9027197
Median Absolute Deviation (MAD): 124
Skewness: -0.1575395354
Sum: 672838
Variance: 21814.06275
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
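Common values
Value | Count | Frequency (%)
353 | 11 | 0.6%
371 | 11 | 0.6%
595 | 10 | 0.5%
234 | 10 | 0.5%
327 | 10 | 0.5%
346 | 10 | 0.5%
375 | 10 | 0.5%
199 | 10 | 0.5%
481 | 9 | 0.5%
423 | 9 | 0.5%
Other values (500) | 1812 | 94.8%

Minimum 5 values
Value | Count | Frequency (%)
32 | 1 | 0.1%
35 | 1 | 0.1%
37 | 1 | 0.1%
44 | 1 | 0.1%
47 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
640 | 5 | 0.3%
613 | 1 | 0.1%
600 | 8 | 0.4%
599 | 1 | 0.1%
598 | 3 | 0.2%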
 

ymax
Real number (ℝ≥0)

Distinct: 483
Distinct (%): 25.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 388.7588912
Minimum: 19
Maximum: 640
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 19
5th percentile: 180.55
Q1: 306
Median: 379
Q3: 492
95th percentile: 585.45
Maximum: 640
Range: 621
Interquartile range (IQR): 186

Descriptive statistics

Standard deviation: 123.5412588
Coefficient of variation (CV): 0.3177837512
Kurtosis: -0.5204190675
Mean: 388.7588912
Median Absolute Deviation (MAD): 92
Skewness: -0.1977835805
Sum: 743307
Variance: 15262.44262
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
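Common values
Value | Count | Frequency (%)
347 | 12 | 0.6%
352 | 12 | 0.6%
499 | 12 | 0.6%
372 | 11 | 0.6%
357 | 11 | 0.6%
295 | 11 | 0.6%
488 | 11 | 0.6%
519 | 11 | 0.6%
419 | 11 | 0.6%
308 | 11 | 0.6%
Other values (473) | 1799 | 94.1%

Minimum 5 values
Value | Count | Frequency (%)
19 | 1 | 0.1%
27 | 1 | 0.1%
30 | 1 | 0.1%
34 | 2 | 0.1%
53 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
640 | 6 | 0.3%
628 | 1 | 0.1%
623 | 1 | 0.1%
622 | 1 | 0.1%
615 | 2 | 0.1%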
 

counting
Boolean

CONSTANT
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 KiB
Value | Count | Frequency (%)
1 | 1912 | 100.0%
 

width_obj
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 263
Distinct (%): 13.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88.50784519
Minimum: 19
Maximum: 500
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 19
5th percentile: 35
Q1: 50
Median: 69
Q3: 97
95th percentile: 236
Maximum: 500
Range: 481
Interquartile range (IQR): 47

Descriptive statistics

Standard deviation: 67.92371857
Coefficient of variation (CV): 0.767431615
Kurtosis: 9.590652458
Mean: 88.50784519
Median Absolute Deviation (MAD): 21
Skewness: 2.836652853
Sum: 169227
Variance: 4613.631545
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
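Common values
Value | Count | Frequency (%)
66 | 35 | 1.8%
48 | 34 | 1.8%
70 | 32 | 1.7%
52 | 31 | 1.6%
67 | 31 | 1.6%
69 | 31 | 1.6%
59 | 30 | 1.6%
50 | 29 | 1.5%
41 | 29 | 1.5%
61 | 28 | 1.5%
Other values (253) | 1602 | 83.8%

Minimum 5 values
Value | Count | Frequency (%)
19 | 1 | 0.1%
21 | 1 | 0.1%
22 | 2 | 0.1%
23 | 1 | 0.1%
24 | 3 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
500 | 1 | 0.1%
498 | 2 | 0.1%
489 | 1 | 0.1%
478 | 1 | 0.1%
469 | 1 | 0.1%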
 

height_obj
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 259
Distinct (%): 13.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.2165272
Minimum: 15
Maximum: 512
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 15
5th percentile: 26
Q1: 39
Median: 57
Q3: 96
95th percentile: 222.45
Maximum: 512
Range: 497
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 72.44846364
Coefficient of variation (CV): 0.8920408954
Kurtosis: 9.799948163
Mean: 81.2165272
Median Absolute Deviation (MAD): 23
Skewness: 2.864488295
Sum: 155286
Variance: 5248.779884
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
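Common values
Value | Count | Frequency (%)
33 | 38 | 2.0%
44 | 37 | 1.9%
36 | 37 | 1.9%
40 | 36 | 1.9%
39 | 36 | 1.9%
37 | 36 | 1.9%
42 | 35 | 1.8%
38 | 35 | 1.8%
30 | 34 | 1.8%
47 | 33 | 1.7%
Other values (249) | 1555 | 81.3%

Minimum 5 values
Value | Count | Frequency (%)
15 | 2 | 0.1%
16 | 2 | 0.1%
17 | 7 | 0.4%
18 | 4 | 0.2%
19 | 10 | 0.5%

Maximum 5 values
Value | Count | Frequency (%)
512 | 1 | 0.1%
498 | 1 | 0.1%
487 | 1 | 0.1%
480 | 1 | 0.1%
474 | 1 | 0.1%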
 

aspect_ratio
Real number (ℝ≥0)

Distinct: 1415
Distinct (%): 74.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.287387278
Minimum: 0.4
Maximum: 5.277777778
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 0.4
5th percentile: 0.6165617021
Q1: 0.8284798535
Median: 1.088279221
Q3: 1.639508197
95th percentile: 2.410549002
Maximum: 5.277777778
Range: 4.877777778
Interquartile range (IQR): 0.8110283432

Descriptive statistics

Standard deviation: 0.606270237
Coefficient of variation (CV): 0.4709307351
Kurtosis: 3.083955024
Mean: 1.287387278
Median Absolute Deviation (MAD): 0.3531107944
Skewness: 1.332219108
Sum: 2461.484475
Variance: 0.3675636003
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 35 | 1.8%
1.333333333 | 12 | 0.6%
2 | 10 | 0.5%
1.571428571 | 6 | 0.3%
0.6666666667 | 6 | 0.3%
0.75 | 6 | 0.3%
1.083333333 | 6 | 0.3%
1.125 | 5 | 0.3%
1.051282051 | 5 | 0.3%
0.7142857143 | 5 | 0.3%
Other values (1405) | 1816 | 95.0%

Minimum 5 values
Value | Count | Frequency (%)
0.4 | 1 | 0.1%
0.4078947368 | 1 | 0.1%
0.4084507042 | 1 | 0.1%
0.4384615385 | 1 | 0.1%
0.4402985075 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
5.277777778 | 1 | 0.1%
5.166666667 | 1 | 0.1%
5.052631579 | 1 | 0.1%
4.117647059 | 1 | 0.1%
4.086956522 | 1 | 0.1%
 

area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1432
Distinct (%): 74.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11424.81747
Minimum: 345
Maximum: 256000
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 345
5th percentile: 1080.55
Q1: 2109
Median: 3768.5
Q3: 8551.5
95th percentile: 54056.1
Maximum: 256000
Range: 255655
Interquartile range (IQR): 6442.5

Descriptive statistics

Standard deviation: 26288.56658
Coefficient of variation (CV): 2.301005391
Kurtosis: 29.6447425
Mean: 11424.81747
Median Absolute Deviation (MAD): 2094.5
Skewness: 5.057784096
Sum: 21844251
Variance: 691088732.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
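Common values
Value | Count | Frequency (%)
1560 | 7 | 0.4%
2016 | 6 | 0.3%
1925 | 6 | 0.3%
1599 | 6 | 0.3%
2730 | 5 | 0.3%
1890 | 5 | 0.3%
1188 | 5 | 0.3%
2736 | 5 | 0.3%
1404 | 5 | 0.3%
3120 | 5 | 0.3%
Other values (1422) | 1857 | 97.1%

Minimum 5 values
Value | Count | Frequency (%)
345 | 1 | 0.1%
361 | 1 | 0.1%
432 | 1 | 0.1%
440 | 1 | 0.1%
459 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
256000 | 1 | 0.1%
248004 | 1 | 0.1%
223533 | 1 | 0.1%
217566 | 1 | 0.1%
214622 | 1 | 0.1%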
 

scale
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1432
Distinct (%): 74.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1382907225
Minimum: 0.03095695937
Maximum: 0.8432740427
Zeros: 0
Zeros (%): 0.0%
Memory size: 14.9 KiB

Quantile statistics

Minimum: 0.03095695937
5th percentile: 0.05478619916
Q1: 0.07653975002
Median: 0.1023134888
Q3: 0.1541238662
95th percentile: 0.3874976871
Maximum: 0.8432740427
Range: 0.8123170833
Interquartile range (IQR): 0.07758411613

Descriptive statistics

Standard deviation: 0.1123293349
Coefficient of variation (CV): 0.8122694919
Kurtosis: 10.6469867
Mean: 0.1382907225
Median Absolute Deviation (MAD): 0.03192606325
Skewness: 3.012836261
Sum: 264.4118613
Variance: 0.01261787947
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0.06582805886 | 7 | 0.4%
0.06664583008 | 6 | 0.3%
0.07312470323 | 6 | 0.3%
0.07483314774 | 6 | 0.3%
0.08774964387 | 5 | 0.3%
0.06749485577 | 5 | 0.3%
0.07245688373 | 5 | 0.3%
0.06244997998 | 5 | 0.3%
0.07745966692 | 5 | 0.3%
0.05744562647 | 5 | 0.3%
Other values (1422) | 1857 | 97.1%

Minimum 5 values
Value | Count | Frequency (%)
0.03095695937 | 1 | 0.1%
0.03166666667 | 1 | 0.1%
0.03464101615 | 1 | 0.1%
0.03496029494 | 1 | 0.1%
0.03570714214 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
0.8432740427 | 1 | 0.1%
0.83 | 1 | 0.1%
0.7879879441 | 1 | 0.1%
0.7773995112 | 1 | 0.1%
0.772121896 | 1 | 0.1%
 

Interactions

Correlations

Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope of the line does not affect r: any positive slope gives +1 and any negative slope gives -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
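A direct transcription of that definition in plain Python, with no dependencies. The n - 1 denominators cancel, so using sample or population standard deviations gives the same r. The second call illustrates the invariance under a change of location and positive scale.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))                       # approximately 0.8
print(pearson_r(x, [3 * v - 7 for v in y]))  # same value after shift and positive scaling
```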

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
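Computing ρ as Pearson's r on ranks makes the difference visible: a monotonic but nonlinear relationship has ρ = 1 while r stays below 1. Tied values receive the average of the ranks they span, which is assumed here as the usual convention (it mirrors pandas' `Series.rank(method="average")`); the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/(n - 1) factors cancel


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4]
y = [1, 8, 27, 64]          # monotonic but not linear
print(pearson_r(x, y))      # below 1
print(spearman_rho(x, y))   # approximately 1.0
```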

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
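The counting rule described here is Kendall's τ_a, sketched below in O(n²) plain Python. When there are ties, as in this dataset (the value 1 appears 16 times in xmin, for example), libraries usually report τ_b instead, which adjusts the denominator for tied pairs; SciPy's `scipy.stats.kendalltau` defaults to τ_b, and which variant the report uses is not stated. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant pair
```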

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package provides extensive documentation.

Missing values

Sample

First rows

index | filename | width | height | class | xmin | ymin | xmax | ymax | counting | width_obj | height_obj | aspect_ratio | area | scale
0 | ABESAN0015_3.jpg | 600 | 600 | broken insulator | 173 | 137 | 395 | 277 | 1 | 222 | 140 | 1.585714 | 31080 | 0.293825
1 | ABESAN0054_2.jpg | 600 | 600 | broken insulator | 215 | 119 | 371 | 237 | 1 | 156 | 118 | 1.322034 | 18408 | 0.226127
2 | ABESAN0054_3.jpg | 600 | 600 | broken insulator | 146 | 155 | 481 | 411 | 1 | 335 | 256 | 1.308594 | 85760 | 0.488080
3 | ABESAN0075_2.jpg | 600 | 600 | broken insulator | 183 | 119 | 437 | 296 | 1 | 254 | 177 | 1.435028 | 44958 | 0.353388
4 | ABESAN0075_3.jpg | 600 | 600 | broken insulator | 168 | 88 | 448 | 303 | 1 | 280 | 215 | 1.302326 | 60200 | 0.408928
5 | ADIYAN0024_02.jpg | 600 | 600 | missing insulator | 234 | 126 | 333 | 246 | 1 | 99 | 120 | 0.825000 | 11880 | 0.181659
6 | ADIYAN0024_03.jpg | 600 | 600 | missing insulator | 176 | 1 | 465 | 350 | 1 | 289 | 349 | 0.828080 | 100861 | 0.529310
7 | ADIYAN0024_03_jpg.rf.429af7288fa432a34a1086ad4ec9df15.jpg | 600 | 600 | missing insulator | 128 | 1 | 504 | 399 | 1 | 376 | 398 | 0.944724 | 149648 | 0.644739
8 | ADIYAN0024_03_jpg.rf.8a34828000d4a41c0bacb3795be2f8d7.jpg | 600 | 600 | missing insulator | 177 | 1 | 584 | 424 | 1 | 407 | 423 | 0.962175 | 172161 | 0.691538
9 | ADIYAN0029_02.jpg | 600 | 600 | missing insulator | 254 | 194 | 368 | 314 | 1 | 114 | 120 | 0.950000 | 13680 | 0.194936

Last rows

index | filename | width | height | class | xmin | ymin | xmax | ymax | counting | width_obj | height_obj | aspect_ratio | area | scale
1902 | SPINTEX0081_02.jpg | 600 | 600 | missing insulator | 277 | 199 | 350 | 276 | 1 | 73 | 77 | 0.948052 | 5621 | 0.124956
1903 | SPINTEX0081_03.jpg | 600 | 600 | missing insulator | 177 | 33 | 450 | 323 | 1 | 273 | 290 | 0.941379 | 79170 | 0.468953
1904 | YIDI0081-1_2.jpg | 600 | 600 | missing insulator | 172 | 236 | 279 | 322 | 1 | 107 | 86 | 1.244186 | 9202 | 0.159878
1905 | YIDI0081-1_2_jpg.rf.24b3c12eeeb7cade70c6b7ddc131a3db.jpg | 600 | 600 | missing insulator | 183 | 217 | 318 | 336 | 1 | 135 | 119 | 1.134454 | 16065 | 0.211246
1906 | YIDI0081-2_2.jpg | 600 | 600 | missing insulator | 207 | 139 | 394 | 291 | 1 | 187 | 152 | 1.230263 | 28424 | 0.280990
1907 | YIDI0081-2_2_jpg.rf.4ef9593b4cb9f6b2b4d43a6e6e9c921a.jpg | 600 | 600 | missing insulator | 227 | 130 | 460 | 335 | 1 | 233 | 205 | 1.136585 | 47765 | 0.364253
1908 | YIDI0081-2_2_jpg.rf.ef05ec2d13b4cb08d939d8980554a344.jpg | 600 | 600 | missing insulator | 226 | 130 | 459 | 335 | 1 | 233 | 205 | 1.136585 | 47765 | 0.364253
1909 | YIDI0081-2_3.jpg | 600 | 600 | missing insulator | 177 | 82 | 547 | 399 | 1 | 370 | 317 | 1.167192 | 117290 | 0.570794
1910 | YIDI0081-2_3_jpg.rf.c22ec6f8750b78c84df9c0a903ea8cc1.jpg | 600 | 600 | missing insulator | 7 | 61 | 476 | 490 | 1 | 469 | 429 | 1.093240 | 201201 | 0.747591
1911 | YIDI0081-2_3_jpg.rf.e8e1c0659412e16d51414343e92ab63f.jpg | 600 | 600 | missing insulator | 162 | 57 | 640 | 506 | 1 | 478 | 449 | 1.064588 | 214622 | 0.772122
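The report does not state how the derived columns are computed, but every sample row is consistent with the formulas below. For row 0: 395 - 173 = 222, 277 - 137 = 140, 222 / 140 ≈ 1.585714, 222 × 140 = 31080 and √(31080 / (600 × 600)) ≈ 0.293825. The sketch treats these formulas as an inference, not a documented definition, and the function name is hypothetical. Because area, width_obj, height_obj and scale are all functions of the same two box dimensions, the high-correlation warnings among them are expected.

```python
import pandas as pd


def add_box_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive box size features from corner coordinates (inferred formulas)."""
    out = df.copy()
    img_w = pd.to_numeric(out["width"])   # typed as categorical in the report
    img_h = pd.to_numeric(out["height"])
    out["width_obj"] = out["xmax"] - out["xmin"]
    out["height_obj"] = out["ymax"] - out["ymin"]
    out["aspect_ratio"] = out["width_obj"] / out["height_obj"]
    out["area"] = out["width_obj"] * out["height_obj"]
    # Square root of the box area as a fraction of the image area.
    out["scale"] = (out["area"] / (img_w * img_h)) ** 0.5
    return out


# Row 0 of the sample (ABESAN0015_3.jpg).
row = pd.DataFrame({"width": ["600"], "height": ["600"],
                    "xmin": [173], "ymin": [137], "xmax": [395], "ymax": [277]})
print(add_box_features(row).iloc[0])
```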